Information Visualisation Project
# Imports
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import statistics as st
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
import matplotlib.patches as mpatches
Aggregating the data
# Import the Dataset
data = pd.read_csv('spotify_songs.csv', sep = ',')
# Creating a second dataframe for popular tracks only
data_popular = data[data['track_popularity'] > 90]
def usable_date(date_string):
    # Year-only values such as '1978' are padded to the first day of that year
    if date_string.count('-') == 0:
        return date_string + '-01-01'
    else:
        return date_string
# Create usable date
data['track_album_release_date'] = data['track_album_release_date'].apply(usable_date)
data['track_album_release_date'] = pd.to_datetime(data['track_album_release_date'], format = '%Y-%m-%d', errors = 'coerce')
# Dropping unused columns for optimisation
data = data.drop(['track_id', 'track_album_id', 'track_album_name', 'playlist_name', 'playlist_id', 'duration_ms'], axis = 1)
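The date padding above can be illustrated on a few sample strings. The sketch below is a minimal, standalone variant that also pads year-month values such as '2012-03'; whether the dataset contains such values is an assumption, and the helper name `normalize_release_date` and the sample strings are hypothetical.

```python
from datetime import datetime

def normalize_release_date(raw):
    """Pad a partial release date ('YYYY' or 'YYYY-MM') to 'YYYY-MM-DD'."""
    parts = raw.split('-')
    # A missing month and/or day defaults to '01'
    while len(parts) < 3:
        parts.append('01')
    return '-'.join(parts)

# Hypothetical sample values, not taken from the dataset
samples = ['2019-06-14', '1978', '2012-03']
normalized = [normalize_release_date(s) for s in samples]

# Every normalized string now parses with a single fixed format
parsed = [datetime.strptime(s, '%Y-%m-%d') for s in normalized]
print(normalized)
```

Without the extra padding step, a year-month string would not match the '%Y-%m-%d' format and would be turned into a missing value by errors = 'coerce'.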
Part 1: All popular songs are the same
In part 1 of this study, the data story unfolds with a specific perspective: the claim that popular music exhibits homogeneity. The hypothesis posits that the commercial nature of music releases has led to a standardized formula for achieving rapid success, resulting in songs that share common characteristics of being loud, energetic, and repetitive.
To investigate this claim, the analysis starts by examining the genre distribution across the entire dataset. A pie chart is used to identify patterns, similarities, and disparities among genres.
A second pie chart then focuses on the genre distribution within the subset of highly popular songs, defined as songs with a track_popularity rating above 90. This examination aims to provide insight into the prevalence and composition of genres within popular music.
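The pie charts in this part pass values = 'track_popularity', which, as far as we can tell, makes each slice the genre's share of summed popularity rather than its share of songs. The sketch below shows both calculations side by side on a few hypothetical rows; only the column names are taken from the dataset.

```python
import pandas as pd

# Hypothetical toy rows; only the column names match the dataset
toy = pd.DataFrame({
    'playlist_genre': ['pop', 'pop', 'rock', 'rap'],
    'track_popularity': [90, 70, 20, 20],
})

# Share of summed popularity per genre (popularity-weighted slices)
weighted_share = (toy.groupby('playlist_genre')['track_popularity'].sum()
                  / toy['track_popularity'].sum() * 100)

# Share of songs per genre (plain counts)
count_share = toy['playlist_genre'].value_counts(normalize=True) * 100

print(weighted_share.round(2))
print(count_share.round(2))
```

In this toy example pop accounts for half of the songs but a larger share of the summed popularity, so the two readings of "genre distribution" can differ noticeably.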
# Pie chart for the complete dataset
fig = px.pie(data,
             values = 'track_popularity',
             names = 'playlist_genre',
             title = 'Distribution of genres for the entire dataset',
             hole = 0.8,
             color_discrete_sequence = px.colors.qualitative.T10)
# Update layout for no legend and better height
fig.update_layout(showlegend = False,
                  height = 400)
# Update traces for textposition and textinfo
fig.update_traces(textposition = 'outside',
                  textinfo = 'label+percent')
fig.show()
This pie chart illustrates the genre distribution in the entire dataset. Pop leads with 24.5%, followed by rock, rap, and R&B, each at roughly 17-19%. Latin accounts for 12.5%, while EDM represents 8.75%. This showcases the diversity of genres in the entire dataset. When looking specifically at popular songs, however, a different distribution is visible.
# Pie chart for popular songs only (track_popularity above 90)
fig = px.pie(data_popular,
             values = 'track_popularity',
             names = 'playlist_genre',
             title = 'Distribution of genres for popular songs',
             hole = 0.8,
             color_discrete_sequence = px.colors.qualitative.T10)
# Update layout for no legend and better height
fig.update_layout(showlegend = False,
                  height = 400)
# Update traces for textposition and textinfo
fig.update_traces(textposition = 'outside',
                  textinfo = 'label+percent')
fig.show()
The second pie chart reveals the genre distribution among popular songs. Here, the dominance of Pop with 72.9% suggests a prevalence of mainstream, commercially driven music. The comparatively low percentages for Rap (18%), Latin (6.11%), and R&B (2.96%) further indicate a lack of diversity and support the notion that popular songs today tend to conform to a formulaic approach.
| Genre | In complete dataset | In popular dataset | Difference (percentage points) |
|---|---|---|---|
| pop | 24.5% | 72.9% | + 48.4 |
| rap | 18.7% | 18% | - 0.7 |
| latin | 12.5% | 6.11% | - 6.39 |
| r&b | 16.6% | 2.96% | - 13.64 |
| rock | 18.9% | 0% | - 18.9 |
| edm | 8.75% | 0% | - 8.75 |
The table above compares genre proportions between the entire dataset and the popular dataset. Pop stands out with a massive increase of 48.4 percentage points, while Rock and EDM disappear from the popular subset entirely. This genre distribution supports the notion that popular music today follows a similar formula, contributing to a perceived similarity in popular songs.
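The difference column is a plain subtraction of the two share columns, expressed in percentage points. A minimal sketch that recomputes it from the shares listed in the table (the variable names are illustrative):

```python
# Genre shares in percent, copied from the two share columns of the table
complete = {'pop': 24.5, 'rap': 18.7, 'latin': 12.5, 'r&b': 16.6, 'rock': 18.9, 'edm': 8.75}
popular = {'pop': 72.9, 'rap': 18.0, 'latin': 6.11, 'r&b': 2.96, 'rock': 0.0, 'edm': 0.0}

# Difference in percentage points: popular share minus complete share
difference = {genre: round(popular[genre] - complete[genre], 2) for genre in complete}

for genre, diff in difference.items():
    print(f'{genre}: {diff:+.2f} percentage points')
```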
The investigation then turns to individual song attributes: danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, and tempo. Several visualizations are used to look for patterns and similarities in these attributes among popular songs.
# Radar chart
# Categories
categories = ['danceability', 'energy', 'liveness', 'speechiness', 'valence', 'instrumentalness', 'acousticness']
# Grab the top 10 most popular songs
top10 = data_popular.sort_values(by = 'track_popularity', ascending = False).head(10)
colors = ['#1f77b4', '#ff7f0e', 'purple', '#d62728', 'brown', '#8c564b', 'black', 'magenta', '#bcbd22', '#17becf']
# Put the names and values of the Top 10 variables
labels = top10['track_name']
values = top10[categories].values
# Create figure
fig = go.Figure()
# Create Scatterpolar (radar chart)
# Use zip function for two variables in the same loop
for i, j, color in zip(labels, values, colors):
    fig.add_trace(go.Scatterpolar(
        r = j,
        theta = categories,
        name = i,
        marker = {'color': color},
        fill = 'toself',
        opacity = 0.5,
    ))
# Update layout for axis range of 0 to 1
fig.update_layout(
    polar = dict(
        radialaxis = dict(
            visible = True,
            range = [0, 1]
        )),
    title = 'Distribution of Audio Features for top 10 most popular songs')
fig.show()
This Radar Chart visually represents the audio features of popular songs. Most songs exhibit concentrated shapes in the middle, indicating similar attributes like high valence and danceability but also low liveness and instrumentalness. This observation supports the claim that popular songs tend to have similar attributes, contributing to their perceived similarity in sound.
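The visual impression of overlapping shapes could also be checked numerically, for instance with the standard deviation of each feature across songs: a small value means the songs agree on that feature. The sketch below uses hypothetical feature values for three made-up songs, not values from the dataset.

```python
import statistics as st

# Hypothetical feature values for three songs (not taken from the dataset)
songs = {
    'song_a': {'danceability': 0.70, 'valence': 0.60, 'liveness': 0.10},
    'song_b': {'danceability': 0.80, 'valence': 0.70, 'liveness': 0.10},
    'song_c': {'danceability': 0.75, 'valence': 0.65, 'liveness': 0.40},
}
features = ['danceability', 'valence', 'liveness']

# Population standard deviation of each feature across the songs
spread = {
    feature: st.pstdev([song[feature] for song in songs.values()])
    for feature in features
}

for feature, value in spread.items():
    print(f'{feature}: {value:.3f}')
```

In this toy example liveness varies more across songs than danceability, so its axis would show the least overlap on a radar chart.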
# Scatter plot for the acousticness
# Grab the 100 most popular songs
fig = px.scatter(data.sort_values(by = 'track_popularity', ascending = False).head(100),
                 x = 'instrumentalness',
                 y = 'acousticness')
# Update layout for title
fig.update_layout(title = 'Analysis of acousticness and instrumentalness for the top 100 most popular songs')
fig.show()
Continuing the investigation, a deeper analysis of popular songs’ attributes is conducted using a scatter plot. Building upon the observation that popular songs tend to have low instrumentalness, we specifically analyze the relationship between instrumentalness and acousticness. Through trial and error, these two attributes were chosen due to their potential influence on song popularity.
The resulting scatter plot shows a distinct pattern: most points lie close to 0 on the x-axis, indicating low instrumentalness. Additionally, there is a slight trend towards lower acousticness values. This reaffirms the notion that popular songs share similar musical characteristics, particularly in terms of instrumentalness, providing further evidence of the perceived similarity in popular music.
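The observation that most points sit near zero instrumentalness could be expressed as a single number: the share of songs below a small cut-off. This is a minimal sketch with hypothetical values; the 0.05 threshold is an assumed cut-off for "close to zero", not one used in the analysis.

```python
# Hypothetical instrumentalness values (not taken from the dataset)
instrumentalness = [0.0, 0.0, 0.001, 0.02, 0.0, 0.35, 0.0, 0.004]

# Assumed cut-off for "close to zero"
threshold = 0.05

# Fraction of songs whose instrumentalness falls below the cut-off
share_near_zero = sum(value < threshold for value in instrumentalness) / len(instrumentalness)
print(f'Share of songs below {threshold}: {share_near_zero:.3f}')
```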
Part 2: Not all popular songs are the same
In part 2, the data story is conveyed from the perspective that “All popular songs do not sound the same, the rise of the internet has given birth to a big diversity of music.”
This analysis begins by examining the genres to identify broad similarities and differences. The focus then shifts to investigating the specific attributes of popular songs, aiming to determine if there is evidence to support the notion of increased diversity in popular music.
# Histogram
fig = px.histogram(data, x = 'track_album_release_date', color = 'playlist_genre', labels = {'track_album_release_date': 'Release Date', 'playlist_genre': 'Genre'})
# Update layout for title and yaxis rename
fig.update_layout(title = 'Distribution of genres over time', yaxis = {'title': 'Number of songs released'})
fig.show()
This histogram reveals the dynamic landscape of popular music over time, showcasing the emergence and popularity of diverse genres. In the earlier years (around 1950-1990), rock and pop music prevailed. In later years, however, genres such as EDM, Rap and Latin music proliferate, indicating a broader spectrum of popular musical expressions. This visualization supports the claim that the rise of the internet has fostered a remarkable diversity of music, challenging the notion that all popular songs sound the same.
To present an accurate depiction of the popularity of various genres, each genre has been further divided into four distinct subgenres. The Sunburst Chart below offers a visual representation of these subgenres. It is important to note that this visualization does not represent the exact proportions of genres in the entire dataset. Rather, it serves as a visual exploration of the different subgenres present in the dataset.
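The label, parent and value lists that a sunburst chart needs do not have to be typed by hand; they could be derived from the genre and subgenre columns. The sketch below is one possible way to do this with pandas on a few hypothetical rows. Only the column names come from the dataset, and the `hierarchy` layout (label, parent, value) is an illustrative choice.

```python
import pandas as pd

# Hypothetical toy rows; only the column names match the dataset
toy = pd.DataFrame({
    'playlist_genre': ['rock', 'rock', 'rock', 'latin', 'latin'],
    'playlist_subgenre': ['hard rock', 'hard rock', 'classic rock', 'reggaeton', 'tropical'],
})

# Leaf level: one row per (genre, subgenre) pair with its song count
leaves = (toy.groupby(['playlist_genre', 'playlist_subgenre'])
             .size()
             .reset_index(name='value')
             .rename(columns={'playlist_genre': 'parent', 'playlist_subgenre': 'label'}))

# Middle level: one row per genre, attached to a single root node
branches = (toy.groupby('playlist_genre')
               .size()
               .reset_index(name='value')
               .rename(columns={'playlist_genre': 'label'}))
branches['parent'] = 'genres'

# Root node: empty parent, value equal to the total number of songs
root = pd.DataFrame({'label': ['genres'], 'parent': [''], 'value': [len(toy)]})

hierarchy = pd.concat([root, branches, leaves], ignore_index=True)[['label', 'parent', 'value']]
print(hierarchy)
```

The three columns correspond to the labels, parents and values arguments of go.Sunburst. If proportional segments were wanted, values together with branchvalues = 'total' could be passed, since each parent's value here equals the sum of its children.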
genres = ['', 'genres', 'genres', 'genres', 'genres', 'genres', 'genres', 'rock', 'rock', 'r&b', 'r&b', 'pop', 'r&b', 'edm', 'r&b', 'latin', 'pop', 'rap',
'rock', 'pop', 'rap', 'latin', 'rap', 'latin', 'pop', 'edm', 'edm', 'latin', 'rap', 'rock', 'edm']
subgenres = ['genres', 'rock', 'edm', 'pop', 'rap', 'r&b', 'latin', 'classic rock', 'hard rock', 'new jack swing', 'neo soul', 'dance pop', 'urban contemporary', 'big room', 'hip pop', 'latin pop', 'indie poptimism', 'gangster rap', 'album rock', 'post-teen pop', 'trap', 'latin hip hop', 'southern hip hop', 'tropical', 'electropop', 'progressive electro house', 'pop edm', 'reggaeton', 'hip hop', 'permanent wave', 'electro house']
value = [18454, 3521, 2045, 3993, 3391, 3326, 2178, 924, 926, 881, 1001, 850, 936, 334, 508, 594, 1288, 865, 828, 891, 679, 673, 1158, 473, 964, 652, 575, 438, 689, 843, 484]
sunburst_data = pd.DataFrame({'playlist_genre': genres, 'playlist_subgenre': subgenres, 'value': value})
# Create the sunburst chart
fig = go.Figure(go.Sunburst(
    labels = sunburst_data['playlist_subgenre'],
    parents = sunburst_data['playlist_genre'],
))
fig.update_traces(hoverinfo = 'label+value')
fig.update_layout(title = 'Sunburst Chart of Genres',
                  height = 600,
                  width = 800)
fig.show()
This chart illustrates the presence of distinct subgenres like “reggaeton” and “latin hip hop”. These subgenres, despite belonging to the broader category of Latin music, have their own unique sonic identities and distinct fan bases. The presence of this diverse array of subgenres challenges the notion of homogeneity within genres, emphasizing the dynamic and evolving nature of music.
However, despite this diversity, the chart also reveals some interesting similarities among subgenres, most notably in their names. For example, “hip hop” in rap and “hip pop” in R&B have strikingly similar names, even though they are not interchangeable categories.
Additionally, it is noteworthy that multiple subgenres include the term “pop” even though they do not fall under the specific genre category of “pop.” In this context, the term “pop” serves as an abbreviation for “popular,” indicating the prevalent nature of these subgenres within their actual genre classifications. The bar chart below effectively visualizes the popularity of each subgenre.
# Define the color mapping dictionary for genre groups
color_mapping = {
    '#AF69EE': ['latin', 'reggaeton', 'tropical', 'latin pop', 'latin hip hop'],
    '#FF5349': ['r&b', 'hip pop', 'neo soul', 'urban contemporary', 'new jack swing'],
    '#187bcd': ['rap', 'trap', 'hip hop', 'gangster rap', 'southern hip hop'],
    '#0ff0fc': ['rock', 'album rock', 'permanent wave', 'classic rock', 'hard rock'],
    '#ffaf7a': ['pop', 'dance pop', 'post-teen pop', 'electropop', 'indie poptimism'],
    '#41dc8e': ['edm', 'big room', 'electro house', 'pop edm', 'progressive electro house']
}
# Group the data by subgenre and calculate the average popularity
avg_popularity_by_subgenre = data.groupby(by = 'playlist_subgenre')['track_popularity'].mean()
# Sort the subgenres by average popularity in descending order
sorted_subgenres = avg_popularity_by_subgenre.sort_values(ascending=False)
# Assign colors to subgenres based on genre groups
subgenre_colors = []
for subgenre in sorted_subgenres.index:
    for color, genres in color_mapping.items():
        if subgenre in genres:
            subgenre_colors.append(color)
            break
plt.figure(figsize=(10, 6))
plt.barh(range(len(sorted_subgenres)), sorted_subgenres, color=subgenre_colors)
plt.yticks(range(len(sorted_subgenres)), sorted_subgenres.index)
plt.title('Average Popularity by Subgenre')
plt.xlabel('Average Popularity')
plt.ylabel('Subgenre')
# Create custom legend handles and labels
legend_handles = [mpatches.Patch(color=color, label=genres[0]) for color, genres in color_mapping.items()]
legend_labels = [genres[0] for color, genres in color_mapping.items()]
plt.legend(handles=legend_handles, labels=legend_labels, loc='upper right')
plt.tight_layout()
plt.show()
This chart reveals the difference between looking at genres and looking at subgenres. While pop is the most popular genre overall, hip hop, a subgenre of rap, has the highest average popularity of any subgenre. It is closely followed by two pop subgenres, dance pop and post-teen pop, highlighting the significant influence and broad appeal of pop music as a whole.
However, what makes this chart even more intriguing is the broad spectrum of popular subgenres. While pop music dominates in terms of overall popularity, the chart showcases a diverse range of subgenres that enjoy substantial listenership. Particularly, the various subgenres within the Latin music category exhibit strikingly similar positions in terms of popularity, indicating a consistent and dedicated fan base for Latin music as a whole.
In contrast, the rock subgenres exhibit a more dispersed distribution throughout the bar chart. This observation suggests a wider range of popularity and fan preferences within the rock genre, with different subgenres resonating differently among listeners. This highlights the significant variations in popularity that exist even within a single genre, underscoring the diversity and distinctiveness of rock music.
Overall, this chart serves as a reminder that there are notable differences in the popularity of different genres and subgenres of music. It demonstrates that examining music solely through broad genre classifications may overlook the rich tapestry of popularity dynamics that exist within each genre.
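The contrast between tightly grouped Latin subgenres and widely dispersed rock subgenres could be quantified as the range of subgenre averages within each genre. The sketch below illustrates the idea on hypothetical rows; the popularity values are made up and only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy rows; only the column names match the dataset
toy = pd.DataFrame({
    'playlist_genre': ['latin', 'latin', 'latin', 'latin', 'rock', 'rock', 'rock', 'rock'],
    'playlist_subgenre': ['reggaeton', 'reggaeton', 'tropical', 'tropical',
                          'hard rock', 'hard rock', 'classic rock', 'classic rock'],
    'track_popularity': [50, 54, 48, 52, 30, 34, 60, 64],
})

# Average popularity per subgenre, keeping the parent genre in the index
subgenre_mean = toy.groupby(['playlist_genre', 'playlist_subgenre'])['track_popularity'].mean()

# Range (max minus min) of the subgenre averages within each genre
spread_by_genre = subgenre_mean.groupby(level='playlist_genre').agg(lambda s: s.max() - s.min())
print(spread_by_genre)
```

A small range means the subgenres of a genre are about equally popular, while a large range corresponds to bars scattered across the chart.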
# Select wanted variables
variables = ['danceability', 'energy', 'loudness', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']
# Create empty dictionary
correlation_data = {}
# Fill in dictionary with correlations
for i in variables:
    corr, pvalue = pearsonr(data_popular[i], data_popular['track_popularity'])
    correlation_data[i] = corr
# Convert the dictionary to a one-row DataFrame, since the heatmap needs 2D data
correlation = pd.DataFrame(correlation_data, index = ['correlation']).round(2)
# Set the font scale before plotting so that it applies to this figure
sns.set(font_scale = 1.2)
# Create plot
plt.figure(figsize=(19,3))
sns.heatmap(correlation, cmap='coolwarm', annot = True, linewidths = 0.5, vmin = -1, vmax = 1)
# Insert labels and title
plt.xlabel('Audio Features')
plt.ylabel('Correlation with Popularity')
plt.title('Pearson Correlation of Audio Features with Popularity')
plt.show()
Lastly, this visualization presents a Pearson correlation heatmap, which shows the correlation coefficient between each audio feature and track popularity within the popular subset. The correlations are generally weak, with coefficients ranging from -0.22 to 0.22. This suggests that none of these attributes has a strong linear relationship with popularity. However, it is still important to note that correlation coefficients alone may not capture the full complexity of the relationship between these variables.
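For reference, the Pearson coefficient is the covariance of two variables divided by the product of their standard deviations, and it only measures linear association. The following standalone sketch computes it by hand on hypothetical values; the function name `pearson_r` and the sample lists are illustrative and not part of the analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values (not taken from the dataset)
energy = [0.2, 0.4, 0.6, 0.8]
popularity_linear = [91, 93, 95, 97]   # rises in step with energy
popularity_mixed = [95, 91, 97, 93]    # no linear trend with energy

print(pearson_r(energy, popularity_linear))
print(pearson_r(energy, popularity_mixed))
```

The first pair is perfectly linear and yields a coefficient of 1 (up to floating-point rounding), while the second yields a coefficient of essentially 0 even though the values are not constant, which is why a weak coefficient rules out a linear relationship but not every kind of relationship.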